The introduction and spread of alien invasive species are a worldwide phenomenon causing global ecological and economic damage. Among the invaders, alien macrocrustaceans are known to be very successful invertebrates that colonise new habitats rapidly. Data from different fresh and brackish waters gathered by the Flemish Environment Agency (VMM) were used to build data-driven models predicting the habitat preference, abundance and species richness of alien macro-Crustacea in surface waters in Flanders. Regression and classification trees, combined with several optimisation methods (e.g. pruning), were used to construct the models. Model performance was moderate, because we sought a balance between performance, ecological relevance and complexity. Three-fold cross-validation showed limited variation between the folds, which indicates that the constructed models are robust and reliable. A sensitivity analysis highlighted, and graphically illustrated, the importance of conductivity, Kjeldahl nitrogen and shipping. Alien macrocrustaceans were predicted to be present under brackish water conditions as well as in fresh waters with intensive ship traffic and low levels of organic pollution. Alien species richness was higher in rivers with intensive ship traffic and increased with increasing conductivity. Alien macrocrustaceans reached high abundances especially in brackish waters; in fresh water, their abundance was generally lower. An integrated model that combined our habitat suitability model with a water quality model was used to predict the future distribution of alien macrocrustaceans. The predictions indicated that the prevalence and species richness of alien macrocrustaceans are likely to increase with improving chemical water quality, whereas their abundance will probably decrease slightly. Our analysis shows that models are a useful tool and that decision makers should focus on vulnerable areas, such as brackish waters and waters with intensive ship traffic, to prevent the further introduction and spread of alien species.

© 2012 Elsevier B.V. All rights reserved. 


1. Introduction 

Rates of species introductions are increasing globally as a consequence of expanding world trade (Vitousek et al., 1997). Unintentional intercontinental transport via ballast water and hull fouling of ships are key introduction vectors for aquatic species (Hulme et al., 2008). Changing environmental conditions, habitat degradation and waterways connecting previously separated biogeographic regions all promote the establishment and spread of alien species worldwide (Bij de Vaate et al., 2002; Boets et al., 2011a). The introduction of alien invasive species often has negative effects on native communities and ecosystems, with consequences such as species loss, biotic homogenization and changes in nutrient cycling (Gurevitch and Padilla, 2004; MacNeil et al., 2011). Therefore, techniques for modelling species' potential distributions could support pro-active strategies to avoid the introduction of alien species, or help in risk analysis by revealing the regions that are hotspots for alien species introductions (Ba et al., 2010; Giovanelli et al., 2008; Worner and Gevrey, 2006). Besides predicting the future distribution of alien species, models can be used to assess the impact of alien invasive species on native species assemblages (Jaarsma et al., 2007). In this way, negative effects of alien invasive species on the environment (e.g. food web disturbance or habitat alteration) as well as on the economy (e.g. high costs for eradication and control) could be anticipated and reduced. Being able to determine which habitats are vulnerable to invasion is essential for effective management, because eradicating alien invasive species after their establishment is often impossible or at least very expensive (Perrings et al., 2005).

* Corresponding author. Tel.: +32 472521819.
E-mail address: pieter.boets@ugent.be (P. Boets).
1574-9541/$ - see front matter
doi: 10.1016/j.ecoinf.2012.06.001

Conventional multivariate statistical methods have limitations for analysing such data, because they mainly assume linear relationships and offer little flexibility in interpreting ecological data. Integrative and adaptive models that capture the non-linearity of a system are therefore envisioned for information processing in ecology (Park and Chon, 2007). Machine learning methods offer an advantage over traditional analysis techniques, because they do not impose prior assumptions about the relationships between variables (Dzeroski and Drumm, 2003). Several data-driven modelling techniques, such as Decision Trees, Artificial Neural Networks, Bayesian Belief Networks and Support Vector Machines, have proven successful in predicting the presence and distribution of species in aquatic ecosystems (Adriaenssens et al., 2004; Boets et al., 2010; Dominguez-Granda et al., 2011; Everaert et al., 2011; Goethals et al., 2007; Hoang et al., 2010). The selection of a modelling technique may be based on specific study objectives, the format of the response variable (i.e. presence-absence versus abundance) and the availability and resolution of predictor variables, such as climatic, physical-chemical and land-use data, that can be related to a species' occurrence and abundance. However, the choice among modelling approaches is sometimes based on cost or convenience (Stohlgren et al., 2010). In this study, we opted to use classification and regression trees to predict the presence, abundance and species richness of alien macrocrustaceans in surface waters in Flanders, since these techniques are widely applied and yield results that are easy to interpret (Boets et al., 2010; Dakou et al., 2007; Everaert et al., 2011; Kampichler et al., 2010; Vaclavik and Meentemeyer, 2009).

In this paper, we focused on alien macrocrustaceans since these are widespread and, together with molluscs, represent the largest share of alien macroinvertebrates in many rivers across Europe (Bernauer and Jansen, 2006; Boets et al., 2011a; Messiaen et al., 2010; Nehring, 2006). Alien macrocrustaceans are often very successful in their new habitat. Their intrinsic characteristics, such as a short generation time, rapid growth with early sexual maturity, high fecundity, and their euryhaline and omnivorous character, make them extremely suited to rapid expansion and establishment in freshwater ecosystems (Bij de Vaate et al., 2002). Several alien macrocrustaceans, such as Dikerogammarus villosus and Procambarus clarkii, are known to affect native as well as alien biota (Boets et al., 2009, 2010). Preventing the introduction and reducing the impact of alien macrocrustaceans requires a strict policy. In this context, the models could support decision-making in water management by targeting measures at high-risk regions.

In the present study, our goals were: (1) to predict in which habitats alien macrocrustaceans are likely to establish, (2) to determine which parameters positively or negatively influence the species richness of alien macrocrustaceans, (3) to assess which environmental conditions are favourable for alien macrocrustaceans to build up high densities and become dominant and (4) to predict the future distribution of alien macrocrustaceans based on an integrated modelling approach. For the latter, habitat suitability models were combined with water quality models that predict changes in chemical water quality due to the installation of planned wastewater treatment plants.

2. Materials and methods 

2.1. Data collection 

The dataset consisted of biological and chemical data collected by the Flemish Environment Agency (VMM). Since 1989, the VMM has monitored water quality at more than 2,500 sampling locations scattered over different water bodies in Flanders (northern Belgium), so extensive data on macroinvertebrates as well as physical-chemical parameters were available. Model development was based on all samples collected during 2004, because that year was characterised by intensive sampling (over 800 samples) spread over Flanders. Macroinvertebrates were collected by standard handnet sampling, or via artificial substrates where the kick sampling method could not be used (Gabriels et al., 2010). Macroinvertebrates were identified to the level needed to calculate the Multimetric Macroinvertebrate Index Flanders (MMIF; Gabriels et al., 2010). Because indigenous and alien species can belong to the same family, the VMM database did not reveal whether alien macrocrustaceans occurred in the samples. Since we wanted to build predictive models for alien macrocrustaceans, only specimens belonging to this group were additionally identified to species level. This provided data on the presence/absence, abundance and species richness of alien macrocrustaceans per sampling location. Conductivity, pH and dissolved oxygen were always measured in the field during macroinvertebrate sampling. Other chemical parameters were retrieved from monitoring data. Because the chemical monitoring, usually performed monthly, was not carried out simultaneously with the macroinvertebrate sampling, the measurements from the last date before macroinvertebrate sampling were used. The slope of a watercourse was determined from the difference in height between two points 1000 m apart, using GIS software (version 9.3.1) applied to the Flemish Hydrographic Atlas (AGIV, 2006). The same data were used to determine the sinuosity over a 100 m stretch. River morphology was evaluated from photographs of the sampling sites: pool-riffle pattern and meandering were each scored from 0 (absent) to 5 (well developed) and summed, yielding a score from 0 to 10. Information on the number of ships passing on navigable waterways originated from the annual reports of nv De Scheepvaart and the River Information Services (RIS). For each sampling point, we recorded whether ships passed and, if so, how many ships passed annually (based on the 2009 report). The complete dataset consisted of three response variables (presence/absence, number of alien macrocrustacean species and abundance of alien macrocrustaceans) and 16 predictor variables, two of which were discrete and 14 continuous (Table 1).
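The derived predictors described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VMM or authors' processing chain: the function names are hypothetical, the chemical records of a site are assumed to be sorted by date, and "the last date before macroinvertebrate sampling" is read as strictly earlier than the sampling date.

```python
from bisect import bisect_left
from datetime import date


def last_measurement_before(chem_dates, chem_values, sample_date):
    """Return the chemical value from the last date strictly before sample_date.

    chem_dates must be sorted ascending; returns None if no earlier record exists.
    (Hypothetical helper: 'before' is interpreted as strictly earlier.)
    """
    i = bisect_left(chem_dates, sample_date)  # index of first date >= sample_date
    return chem_values[i - 1] if i > 0 else None


def morphology_score(pool_riffle, meandering):
    """Sum of two 0-5 field scores, giving the 0-10 river morphology score."""
    for s in (pool_riffle, meandering):
        if not 0 <= s <= 5:
            raise ValueError("each component score must lie between 0 and 5")
    return pool_riffle + meandering


def slope_m_per_1000m(height_upstream_m, height_downstream_m, distance_m=1000.0):
    """Height difference expressed per 1000 m of watercourse (unit m/1000 m)."""
    return (height_upstream_m - height_downstream_m) / distance_m * 1000.0


if __name__ == "__main__":
    dates = [date(2004, 1, 5), date(2004, 2, 3), date(2004, 3, 2)]
    orthophosphate = [1.2, 0.9, 1.5]  # illustrative values, mg P/L
    print(last_measurement_before(dates, orthophosphate, date(2004, 2, 20)))  # 0.9
    print(morphology_score(3, 4))  # 7
```

Binary search keeps the lookup cheap when a site has many monthly chemical records; a measurement taken on the sampling day itself is ignored under this reading.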


2.2. Model development and validation 

Two types of decision trees were used to construct the models: classification and regression trees. A decision tree is called a classification tree (CT) if the response variable is qualitative (e.g. presence/absence of alien macrocrustaceans) and a regression tree (RT) if the response variable is quantitative (e.g. alien species richness or abundance). Decision trees were grown from a training set of records with a recursive partitioning algorithm known as ‘Top-Down Induction of Decision Trees’ (Quinlan, 1986). At each step, the most informative input variable is selected as the root of the sub-tree, and the current training set is split into subsets according to the values of that variable. In this way, rules are generated that relate the predictor variables (e.g. river morphology) to the response variables (e.g. presence/absence of alien macrocrustaceans). For discrete predictor variables, a branch is typically created for each possible value of the variable. For continuous predictor variables, a threshold is selected and two branches are created based on that threshold. Tree construction ends when the variation of the target values of all examples in a node falls within a certain range. Such terminal nodes are called leaves and are labelled with a regression equation in the case of regression trees, or with the corresponding class value (e.g. presence or absence) in the case of classification trees.
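The top-down induction procedure described above can be sketched for a classification tree as follows. This is a simplified illustration, not J48/C4.5 itself: it uses plain information gain (C4.5 uses the gain ratio), only binary threshold splits on numeric variables (a discrete variable such as shipping can be coded 0/1), and a minimum number of instances per branch as the pre-pruning rule. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features, min_instances):
    """Return (gain, feature, threshold) of the most informative binary split, or None."""
    base, n, best = entropy(labels), len(labels), None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold midway between adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if len(left) < min_instances or len(right) < min_instances:
                continue  # pre-pruning: each branch needs enough instances
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, thr)
    return best


def grow(rows, labels, features, min_instances=2):
    """Recursively partition the training set; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # pure node
    split = best_split(rows, labels, features, min_instances)
    if split is None or split[0] <= 0:
        return majority  # no useful or permitted split left
    _, f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": grow([r for r, _ in left], [y for _, y in left], features, min_instances),
        "right": grow([r for r, _ in right], [y for _, y in right], features, min_instances),
    }


def predict(tree, row):
    """Follow the thresholds from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree
```

For example, with four hypothetical sites at 500, 600, 2000 and 3000 µS/cm labelled absent, absent, present, present, `grow(rows, labels, ["conductivity"], min_instances=1)` splits once at 1300 µS/cm, midway between the two middle values.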

Pruning was performed to prevent trees from over-fitting the data (Dzeroski and Drumm, 2003) and to keep them easily interpretable (Dakou et al., 2007). Pruning can be applied during tree construction (pre-pruning) and/or after the tree has been constructed (post-pruning). Pre-pruning requires a minimum number of instances before branching continues. Post-pruning, on the other hand, replaces some of the terminal sub-trees of a highly branched tree by leaves, with the extent of pruning controlled by the pruning confidence factor (PCF). In our case, both pre- and post-pruning were used as optimisation techniques.
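Post-pruning with a confidence factor can be illustrated with a C4.5-style pessimistic error estimate: the training errors at a node are replaced by the upper limit of a one-sided confidence interval on its error rate, and a sub-tree is replaced by a leaf when the leaf's estimated errors do not exceed the summed estimates of the sub-tree's leaves. The sketch below uses the normal (Wilson score) approximation and omits the small-sample corrections that C4.5 and WEKA's J48 apply, so its numbers will not match J48 exactly; the function names are hypothetical. A lower confidence factor gives a larger z-value, more pessimistic estimates and therefore more aggressive pruning.

```python
import math
from statistics import NormalDist


def pessimistic_errors(n, errors, cf=0.25):
    """Upper confidence limit on the number of errors at a node with n instances."""
    z = NormalDist().inv_cdf(1 - cf)  # one-sided z-value for confidence factor cf
    f = errors / n  # observed training error rate
    upper = (f + z * z / (2 * n)
             + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)
    return upper * n


def should_prune(leaf, children, cf=0.25):
    """leaf and children are (n_instances, n_errors) tuples.

    Replace the sub-tree by a leaf if the leaf's pessimistic error estimate
    is no larger than the summed estimates of the sub-tree's leaves.
    """
    subtree = sum(pessimistic_errors(n, e, cf) for n, e in children)
    return pessimistic_errors(*leaf, cf=cf) <= subtree
```

For example, a node with 6 instances and 1 error that is split into leaves with (3, 0) and (3, 1) gains nothing on the training data, and `should_prune((6, 1), [(3, 0), (3, 1)])` returns True at CF = 0.25, because the small leaves receive proportionally larger pessimistic penalties.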
Table 1

Average values of the assessed environmental parameters for the three constructed models (habitat preference, species richness and abundance of alien macrocrustaceans), with minimum and maximum values in parentheses (BOD: Biological Oxygen Demand; COD: Chemical Oxygen Demand).

Variable | Unit | Habitat preference | Alien species richness | Abundance of alien species
--- | --- | --- | --- | ---
Ammonium | mg N/L | 1.82 (0.04-48.6) | 1.38 (0.06-10.6) | 1.49 (0.08-10.8)
BOD | mg/L | 5.71 (1.0-354) | 4.44 (1.06-30) | 4.70 (1.0-30)
COD | mg/L | 34.7 (2.5-680) | 32.9 (8.0-132) | 33.7 (2.5-132)
Dissolved oxygen | mg/L | 7.0 (0.2-28.3) | 7.07 (1.0-21.6) | 7.11 (0.5-28.3)
Conductivity | µS/cm | 1109 (90-17,570) | 1612 (90-17,570) | 1397 (90-17,570)
Kjeldahl nitrogen | mg N/L | 3.9 (0.8-163) | 2.66 (0.8-119) | 2.89 (0.80-12.2)
Nitrate | mg N/L | 3.3 (0.1-31.4) | 2.7 (0.1-15.7) | 3.1 (0.2-18)
Nitrite | mg N/L | 0.16 (0.002-3.0) | 0.14 (0.002-0.87) | 0.15 (0.002-0.87)
Orthophosphate | mg P/L | 0.46 (0.003-16.0) | 0.38 (0.004-2.38) | 0.38 (0.005-3.33)
Total phosphorus | mg P/L | 1.06 (0.05-100) | 0.82 (0.06-6.17) | 0.82 (0.05-6.17)
pH | - | 7.7 (6.0-9.4) | 7.7 (6.4-9.2) | 7.7 (6.0-9.4)
Sinuosity | - | 1.04 (0.0-1.98) | 1.03 (0.0-1.74) | 1.05 (0.0-1.89)
Slope | m/1000 m | 2.24 (0.0-42.5) | 1.33 (0.0-20.5) | 1.67 (0.0-20.5)
Number of ships | - | 934 (0-32,772) | 2032 (0-32,772) | 1840 (0-32,772)
River morphology | classes (0-10) | 3 (0-10) | 3 (0-10) | 3 (0-8)
Number of alien Crustacea | species/sample | 0.44 (0-5) | 1.2 (0-5) | 1 (0-5)
Abundance of alien Crustacea | individuals/sample | 10 (0-224) | 25 (0-202) | 22 (0-202)
Shipping | class (0,1) | Present (n = 107); Absent (n = 776) | Present (n = 30); Absent (n = 111) | Present (n = 51); Absent (n = 222)
Alien Crustacea | class (0,1) | Present (n = 299); Absent (n = 584) | Present (n = 94); Absent (n = 47) | Present (n = 182); Absent (n = 91)


Model training and evaluation were based on three-fold cross-validation. After reshuffling, the dataset was randomly split into three subsets; in each fold, two subsets (two thirds of the data) were used for training and the remaining third for validation. A model was built for each training/validation combination, yielding a performance value for each of the three models. The average performance was used as the final criterion for model evaluation. Model performance was assessed with the percentage of Correctly Classified Instances (CCI) and Cohen's Kappa statistic (K) for classification trees, and with the multiple correlation coefficient (R) for regression trees. For satisfactory model performance, CCI should be at least 70% and K at least 0.4 (Gabriels et al., 2007). For the multiple correlation coefficient, the closer the value is to 1, the better the model predicts the data (Everaert et al., 2010).
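The validation scheme and performance measures above can be sketched as follows. The sketch assumes that `fit` is any function that takes training features and targets and returns a prediction function; it is not WEKA's implementation, the fold assignment is one simple way of producing three random, equally sized subsets, and reporting mean ± standard deviation over the folds is an assumption about how the ± values in the figure captions were obtained. For R, the Pearson correlation between observed and predicted values is used.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def kfold_indices(n, k=3, seed=0):
    """Shuffle record indices and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cci(y_true, y_pred):
    """Percentage of correctly classified instances."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_exp = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else float("nan")


def pearson_r(x, y):
    """Correlation coefficient between observed and predicted values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_validate(fit, X, y, metric, k=3, seed=0):
    """Return (mean, standard deviation) of a metric over k folds."""
    scores = []
    for train, test in kfold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(metric([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(scores), stdev(scores)
```

For a classification tree, `metric` would be `cci` or `cohen_kappa`; for a regression tree, `pearson_r`.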

The classification trees were constructed with the J48 algorithm (Hall et al., 2009), a re-implementation of the C4.5 algorithm. Regression trees were built using M5' (Wang and Witten, 1997), a re-implementation of the M5 algorithm (Quinlan, 1992). For both techniques, the default settings of the machine learning package WEKA were applied (Witten and Frank, 2005), except for the PCF (pruning confidence factor) and the minimum number of instances required for further splits, which were adapted to obtain the optimal model. The optimal model was defined as the one with a good balance between technical performance (CCI, K) on the one hand, and high ecological relevance and low complexity on the other.
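The trade-off between performance and complexity can be made explicit with a small search over the two tuned settings. The selection rule below (keep settings whose mean kappa is within a tolerance of the best, then prefer the tree with fewest leaves) is a hypothetical formalisation, not the authors' procedure: ecological relevance was judged by the authors and is not scored here. The `evaluate` function is assumed to be supplied by the caller, for example from a cross-validation run.

```python
from itertools import product


def select_settings(evaluate, pcf_values, min_instance_values, kappa_tolerance=0.05):
    """Pick the simplest tree among settings whose kappa is close to the best.

    evaluate(pcf, min_instances) -> (mean_cci, mean_kappa, n_leaves).
    Ecological relevance is not scored here; shortlisted trees would still
    need expert review.
    """
    results = {(p, m): evaluate(p, m) for p, m in product(pcf_values, min_instance_values)}
    best_kappa = max(kappa for _, kappa, _ in results.values())
    shortlist = [s for s, (_, kappa, _) in results.items()
                 if kappa >= best_kappa - kappa_tolerance]
    # prefer fewer leaves, then higher kappa
    return min(shortlist, key=lambda s: (results[s][2], -results[s][1]))
```

Favouring the smallest acceptable tree mirrors the stated preference for low complexity and interpretability over a marginal gain in technical performance.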


In total, 882 samples from 2004, taken at sampling locations scattered over different surface water types in Flanders and comprising biological as well as physical-chemical and shipping data, were used to build the models. All response and predictor variables used are listed in Table 1. Three different datasets were compiled, used to predict the habitat preference, abundance and species richness of alien macrocrustaceans, respectively. Classification trees, which were used to model the habitat preference of alien macrocrustaceans, deal very well with missing data and outliers (Pham, 2006), so all data (882 instances) were used for this technique. For the regression trees, missing data and outliers (values exceeding three times the standard deviation) were removed and the database was stratified, since this generally yields more consistent and robust performance (Everaert et al., 2010). Stratification meant that each possible outcome was represented by the same number of instances in the database. This resulted in 273 instances for the abundance model and 141 instances for the model predicting the species richness of alien macrocrustaceans.
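The pre-processing for the regression-tree datasets can be sketched as below. Two interpretations are assumptions rather than details given in the text: an outlier is taken to be a value more than three (population) standard deviations from the variable's mean, and stratification is implemented by randomly down-sampling every outcome value to the size of the rarest one. Records are plain dictionaries and the function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def remove_missing_and_outliers(records, variables, k=3.0):
    """Drop records with missing values or values beyond mean +/- k standard deviations."""
    complete = [r for r in records if all(r.get(v) is not None for v in variables)]
    bounds = {}
    for v in variables:
        values = [r[v] for r in complete]
        mu, sd = mean(values), pstdev(values)
        bounds[v] = (mu - k * sd, mu + k * sd)
    return [r for r in complete
            if all(bounds[v][0] <= r[v] <= bounds[v][1] for v in variables)]


def stratify(records, target, seed=0):
    """Down-sample so that every outcome value has the same number of records."""
    groups = defaultdict(list)
    for r in records:
        groups[r[target]].append(r)
    n_min = min(len(g) for g in groups.values())
    rng = random.Random(seed)
    balanced = [r for key in sorted(groups) for r in rng.sample(groups[key], n_min)]
    rng.shuffle(balanced)
    return balanced
```

Down-sampling discards records from common outcomes; over-sampling rare outcomes would be the alternative design choice, at the cost of duplicated records.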

2.3. Sensitivity analysis 

A sensitivity analysis was performed on the regression tree models to determine the weight of each variable in the regression equations and to check the robustness of the constructed models. For each parameter, the minimum, maximum and average values were determined (Table 1). The outcome of each regression equation of the selected model was then calculated while keeping all parameters constant at their averages except the one whose sensitivity was being analysed, which ranged from its minimum to its maximum value. Dividing the maximum by the minimum outcome of the regression equation yielded a factor indicating the importance of that parameter. In addition, the effect of conductivity on the species richness and abundance of alien macrocrustaceans was illustrated graphically for each of the three folds, by keeping all other parameters constant (average values) while conductivity ranged from low (freshwater conditions) to high values (brackish water conditions).
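The one-at-a-time procedure above can be sketched for a single linear model of a regression-tree leaf. The coefficients and parameter ranges in the tests are hypothetical placeholders, not the fitted equations of Figs. 2 and 4. Because the ratio of maximum to minimum outcome is only meaningful when the minimum outcome is positive, the sketch returns NaN otherwise; that guard is an addition of this sketch.

```python
def linear_model(intercept, coefs, x):
    """Evaluate intercept + sum(coef * value) for one record."""
    return intercept + sum(c * x[name] for name, c in coefs.items())


def sensitivity_factors(intercept, coefs, means, mins, maxs, steps=50):
    """Vary one parameter from min to max with the others at their means.

    Returns, per parameter, max(outcome) / min(outcome).
    """
    factors = {}
    for name in coefs:
        outcomes = []
        for i in range(steps + 1):
            x = dict(means)  # all parameters at their average value
            x[name] = mins[name] + (maxs[name] - mins[name]) * i / steps
            outcomes.append(linear_model(intercept, coefs, x))
        lo, hi = min(outcomes), max(outcomes)
        factors[name] = hi / lo if lo > 0 else float("nan")
    return factors


def response_curve(intercept, coefs, means, name, values):
    """Outcome as one parameter (e.g. conductivity) varies, others held at their means."""
    return [linear_model(intercept, coefs, {**means, name: v}) for v in values]
```

For a purely linear leaf the extremes occur at the ends of the range, so the intermediate steps matter only when the curve itself is plotted, as in the conductivity figures.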

2.4. Future dispersal 

Predictions on the future prevalence, species richness and abundance of alien macrocrustaceans were made with an integrated modelling approach. The constructed classification and regression tree models were combined with predictions of the improvement in chemical water quality (PEGASE water quality model) due to the installation of planned wastewater treatment plants (Ronse and D’heygere, 2007). With the PEGASE model, physical-chemical data were modelled for three years: 2006 (reference data), 2015 and 2027, in line with the deadlines set by the European Union Water Framework Directive (European Union, 2000). Based on previous research on the distribution of alien macrocrustaceans in Flanders (Boets et al., 2011a), and for practical reasons (e.g. PEGASE data were not available for all catchments in Flanders), we opted to investigate the distribution of alien macrocrustaceans in one selected catchment in Flanders (the canal Ghent-Terneuzen and its tributaries). The physical-chemical water quality parameters generated per segment by the water quality model were used as input for our habitat suitability models to predict the future distribution. We assumed that shipping and the number of ships remained constant, as we had no predictive data on future shipping intensity. Conductivity was kept constant because this parameter is not included in the PEGASE water quality model, so no predictions of possible changes could be made. Finally, sinuosity and river morphology were also kept constant, as these parameters are not expected to change within this timeframe. All other physical-chemical water quality parameters changed according to the water quality model. The final outcome of the different models was calculated and then visualised in ArcMap (version 9.3.1).

[Fig. 1 image: classification tree. Conductivity >1740 µS/cm: present. Conductivity <1740 µS/cm: split on shipping; shipping absent: absent; shipping present: split on orthophosphate (<0.2 mg/L: present; >0.2 mg/L: absent).]

Fig. 1. Classification tree predicting the presence or absence of alien macrocrustaceans in surface waters in Flanders (pruning confidence factor = 0.25; Correctly Classified Instances = 72 ± 4%; Cohen's Kappa = 0.28 ± 0.06).
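The coupling of the habitat suitability models with the water quality model output (Section 2.4) can be sketched as follows: for each segment, the variables assumed constant (shipping, number of ships, conductivity, sinuosity and river morphology) keep their baseline values, the remaining physical-chemical parameters are overwritten by the scenario values, and prevalence is the fraction of segments predicted as occupied. Segment identifiers, field names and the example model in the tests are hypothetical; the actual PEGASE output format is not described here.

```python
# Variables held at their baseline values in all scenarios (Section 2.4).
CONSTANT = {"shipping", "n_ships", "conductivity", "sinuosity", "river_morphology"}


def scenario_inputs(baseline, scenario_chemistry):
    """Overlay modelled chemistry on a segment's baseline predictors."""
    x = dict(baseline)
    for name, value in scenario_chemistry.items():
        if name in CONSTANT:
            raise ValueError(f"{name} is held constant and cannot be overwritten")
        x[name] = value
    return x


def prevalence(segments, scenario, model):
    """Fraction of segments predicted as occupied under a scenario.

    segments: {segment_id: baseline predictor dict}
    scenario: {segment_id: modelled chemistry dict} (empty for the reference)
    model:    function mapping a predictor dict to True (present) / False (absent)
    """
    hits = [model(scenario_inputs(base, scenario.get(seg, {})))
            for seg, base in segments.items()]
    return sum(hits) / len(hits)
```

Raising an error when a scenario tries to change a held-constant variable makes the modelling assumptions explicit instead of silently overwriting them.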

3. Results 

3.1. Habitat preference 

The presence or absence of alien macrocrustaceans could be predicted with moderate performance from the physical-chemical variables and the shipping information. Only the tree that combined acceptable reliability with low complexity and high ecological relevance was selected and presented (Fig. 1). After pruning (PCF = 0.25), a classification tree with four leaves was obtained. This tree correctly classified 72 ± 4% of the instances, with K = 0.28 ± 0.06. The model revealed that alien macrocrustaceans are present at high conductivities (>1740 µS/cm), which can be attributed to brackish waters. If the conductivity was lower than or equal to 1740 µS/cm, other factors determined whether alien macrocrustaceans were present. In these fresh waters, alien macrocrustaceans were present where ships passed and the orthophosphate concentration was lower than 0.2 mg/L (Fig. 1). This indicates that under freshwater conditions, shipping in combination with good chemical water quality promotes the occurrence of alien macrocrustaceans.
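The four leaves of the tree in Fig. 1 can be written as explicit rules. The sketch below simply transcribes the thresholds reported above into a hypothetical function; how a value of exactly 0.2 mg/L orthophosphate is handled is not stated, so treating it as "absent" is an assumption.

```python
def predict_alien_presence(conductivity_us_cm, shipping, orthophosphate_mg_l):
    """Rules of the pruned classification tree (Fig. 1)."""
    if conductivity_us_cm > 1740:
        return "present"  # brackish conditions
    if not shipping:
        return "absent"  # fresh water without ship traffic
    # fresh water with ship traffic: presence depends on orthophosphate
    return "present" if orthophosphate_mg_l < 0.2 else "absent"
```

Written this way, the tree reads as a short checklist that water managers could apply site by site.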

3.2. Species richness 

As for habitat preference, only the model combining acceptable reliability with low complexity and high ecological relevance is presented (Fig. 2). The model predicting the species richness of alien macrocrustaceans was a regression tree with three leaves and an average performance of R = 0.59 ± 0.06 (Fig. 2). Alien species richness was predicted as follows. If the number of ships was lower than or equal to 192 per year, linear model 1 (LM1) was used. LM1 consisted of the variables chemical oxygen demand (COD), total phosphorus, conductivity, Kjeldahl nitrogen and slope. According to LM1, the number of alien macrocrustacean species present in the surface waters increased with increasing conductivity. If the number of ships was higher than 192 and the conductivity was lower than 456 µS/cm, model LM2 was applied. LM2 used the same variables as LM1, and they contributed in a similar way to the increase or decrease of alien species richness. Finally, if the number of ships was higher than 192 and the conductivity was higher than 456 µS/cm, alien species richness was determined by LM3. LM3 used the same variables as LM1 and LM2, but in this case both increasing conductivity and increasing phosphorus concentration contributed positively to alien macrocrustacean species richness. Based on these regression equations, the average alien species richness was calculated for each linear model: 1.1 species for LM1 (112 sites), 2.7 species for LM2 (5 sites) and 1.9 species for LM3 (21 sites). Sensitivity analysis indicated that with intensive ship traffic (>192 ships per year), alien species richness was generally higher than with low levels of ship traffic (<192 ships per year) (Fig. 3A, B). Conductivity was an important variable determining alien species richness, especially at higher conductivities, since above a threshold of 456 µS/cm the alien species richness increased with increasing conductivity (Fig. 3B). The highest species richness of alien macrocrustaceans was reached in fresh water with good chemical water quality and intensive ship traffic. Sensitivity analysis of the linear regression equations showed that Kjeldahl nitrogen made a major contribution to alien species richness: high levels of Kjeldahl nitrogen (>10 mg/L) substantially reduced the species richness of alien macrocrustaceans.
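The routing of a site to one of the three linear models of the species richness tree can be sketched as below. The fitted coefficients of LM1 to LM3 are given only in Fig. 2, so the hypothetical function returns the leaf label, and the reported average richness of each leaf is kept in a lookup table instead of evaluating an equation. Assigning a conductivity of exactly 456 µS/cm to LM2 is an assumption.

```python
# Average alien species richness per leaf, as reported in Section 3.2.
REPORTED_MEAN_RICHNESS = {"LM1": 1.1, "LM2": 2.7, "LM3": 1.9}


def richness_leaf(ships_per_year, conductivity_us_cm):
    """Select the linear model of the species richness tree (Fig. 2)."""
    if ships_per_year <= 192:
        return "LM1"  # low ship traffic
    return "LM2" if conductivity_us_cm <= 456 else "LM3"
```

Note that conductivity only enters the tree structure when ship traffic exceeds 192 ships per year; at lower traffic it acts solely through the LM1 equation.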

3.3. Abundance 

The most reliable and ecologically relevant model was a regression tree with two leaves and a correlation coefficient (R) of 0.56 ± 0.08, which predicted the abundance of alien species in surface waters in Flanders (Fig. 4). If the conductivity was lower than or equal to 1041 µS/cm, linear model LM1 was used. LM1 predicts the abundance of alien macrocrustaceans as a function of ammonium, COD, conductivity, pH and the number of ships. Ammonium and conductivity have a negative influence on abundance, whereas abundance increases with increasing COD, number of ships and pH. The average number of individuals, calculated over the 209 sites to which the equation applied, was nine. In fresh waters, conductivity and ammonium negatively affect abundance, which indicates that abundance decreases with increasing nutrient content. If the conductivity was higher than 1041 µS/cm, linear model LM2 was used. LM2 used ammonium, COD, conductivity, Kjeldahl nitrogen, orthophosphate, pH and the number of ships to determine the abundance of alien macrocrustaceans. Only ammonium and Kjeldahl nitrogen contributed negatively to abundance, whereas for all other predictor variables, abundance increased when some or all of these variables increased. LM2 was applied to 64 sites, with on average 64 individuals. This indicates that alien macrocrustaceans can reach higher abundances in brackish waters (high conductivity) and that high levels of orthophosphate do not necessarily have a negative influence on abundance. In fresh water, increasing conductivity leads to a decrease in the abundance of alien macrocrustaceans, whereas in brackish water increasing conductivity influences abundance positively (Fig. 5). Sensitivity analysis revealed that the freshwater regression equation (LM1) was mostly influenced by the number of ships, whereas for brackish water (LM2) Kjeldahl nitrogen had an important negative effect on the abundance of alien macrocrustaceans.

Fig. 2. Regression tree with regression equations predicting the species richness of alien macrocrustaceans in surface waters in Flanders (minimum number of instances = 4; correlation coefficient = 0.59 ± 0.06).

[Fig. 3 image: panels (A) and (B); x-axis: conductivity (µS/cm); one curve per fold (fold 1, fold 2, fold 3).]

Fig. 3. Sensitivity analysis illustrating the effect of changing conductivity on the species richness of alien macrocrustaceans (A) at low (<192 ships per year) and (B) at high levels of ship traffic (>192 ships per year) for all three folds.
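The abundance tree of Section 3.3 can be summarised in the same way. The hypothetical sketch below routes a site by conductivity and records, per leaf, the effect directions and average abundances stated in the text; the fitted coefficients themselves appear only in Fig. 4 and are not reproduced. Assigning exactly 1041 µS/cm to LM1 follows the "lower than or equal to" wording above.

```python
# Effect directions per leaf as described in Section 3.3:
# +1 = abundance increases with the variable, -1 = abundance decreases.
EFFECT_SIGNS = {
    "LM1": {"ammonium": -1, "conductivity": -1, "COD": +1, "n_ships": +1, "pH": +1},
    "LM2": {"ammonium": -1, "kjeldahl_nitrogen": -1, "COD": +1, "conductivity": +1,
            "orthophosphate": +1, "pH": +1, "n_ships": +1},
}
REPORTED_MEAN_ABUNDANCE = {"LM1": 9, "LM2": 64}  # individuals per sample


def abundance_leaf(conductivity_us_cm):
    """Select the linear model of the abundance tree (Fig. 4)."""
    return "LM1" if conductivity_us_cm <= 1041 else "LM2"
```

The opposite sign of conductivity in the two leaves is what produces the fresh/brackish contrast shown in Fig. 5.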

3.4. Future dispersal 

Based on our integrated model, we predicted the future prevalence, alien species richness and abundance of alien macrocrustaceans for the canal Ghent-Terneuzen and its tributaries. The model predicts a 10% increase in the prevalence of alien macrocrustaceans by 2027 (Fig. 6). The small canals around the city of Ghent as well as a tributary in the east are likely to be colonised by alien macrocrustaceans. Predicted alien species richness also increases over the years (Fig. 7). Compared to the 2006 reference situation, the upper part of the canal Ghent-Terneuzen shows an increase in modelled alien species richness from on average one to on average two alien species by 2027. Although predicted prevalence and alien species richness generally increase, abundance is predicted to decrease over the years. The predicted abundance in the canal Ghent-Terneuzen lies between 11 and 100 alien macrocrustacean individuals per sample in 2027 (Fig. 8). The newly colonised tributary in the east has a very low predicted abundance of alien macrocrustaceans, with on average only one individual per sample.

4. Discussion 

4.1. Habitat modelling 

Habitat suitability models can be applied to predict the potential distribution of alien species and to reveal their ecological niche preferences (Drake and Bossenbroek, 2009; Peterson, 2003; Pitt et al., 2009). These models can identify habitats at risk of invasion, which can help subsequent management efforts to maximize the efficacy of preventive measures against the spread of alien invasive species. Although these models are not perfectly accurate, they provide information on species preferences and invasion potential (Ba et al., 2010; Boets et al., 2010). Our habitat suitability models revealed that alien macrocrustaceans occur especially in brackish waters, or in fresh water with intensive ship traffic and low nutrient levels. Both salinity and shipping are known to be important parameters influencing the establishment and spread of alien macroinvertebrates (Ba et al., 2010; Bij de Vaate et al., 2002; Everaert et al., 2011; Grabowski et al., 2009).

Brackish waters are typically characterised by a low density and diversity of native species, which makes it easier for alien species to establish (Remane, 1958; Wolff, 1999). Such waters, with unsaturated ecological niches, have a high potential to be invaded by alien macroinvertebrates (Paavola et al., 2005). Many alien macrocrustaceans tolerate high salinities and can therefore easily establish in brackish water (Grabowski et al., 2007). In addition, brackish water species are assumed to have a better chance of being transported alive than marine or freshwater species (Wolff, 1999). Grabowski et al. (2009) found that in the two largest rivers of the Baltic basin in Poland, alien amphipods occurred mostly at conductivities above 800 µS/cm, where they reached high population densities. Moreover, brackish waters are subject to a two-sided ‘invasion pressure’, since species introduced in freshwater as well as in marine environments can migrate into them (Nehring, 2006). Indigenous species inhabiting rivers with increased conductivity run a higher risk of being displaced by alien ones, since alien species often have a competitive advantage over native species in polluted rivers (Grabowski et al., 2009). Organic discharges should therefore be minimized at all times, since these could ease the further spread of alien species (Grabowski et al., 2009).

Shipping is recognised as the most important vector of aquatic alien species introductions to Europe (Bij de Vaate et al., 2002; Gollasch, 2006). Improved ship design allows larger and faster ships, resulting in more frequent ship arrivals and larger volumes of ballast water being released. Faster ships also make shorter voyages, which improves the survival of alien species in transit (Gollasch, 2006). Harbours and rivers with intensive ship traffic can therefore be seen as hotspots for species introductions. Once an alien species has been introduced into its new habitat, secondary dispersal via ballast water or hull fouling of ships can account for a substantial part of its further spread (Bij de Vaate et al., 2002).

Fig. 4. Regression tree with regression equations predicting the abundance of alien macrocrustaceans in surface waters in Flanders (minimum number of instances = 4; correlation coefficient = 0.56 ± 0.08).

Fig. 5. Sensitivity analysis illustrating the effect of changing conductivity on the abundance of alien macrocrustaceans for all three folds.

The model predicting the species richness of alien macrocrustaceans indicated that intensive ship traffic in combination with conductivity was the main driver of high alien species richness. As mentioned earlier, shipping is a key vector of species introductions (Bij de Vaate et al., 2002), and shipping activity is regarded as an important proxy variable for propagule pressure (Ricciardi, 2006). The ‘propagule pressure’ concept focuses on the number of invading propagules per introduction event and the frequency with which they are introduced (Williamson and Fitter, 1996). The more ships pass through a water body, the higher the chance that alien species become established and that alien species richness increases. This pattern was also observed in the river Rhine, where the number of alien taxa decreased upstream with decreasing cargo transport (Wirth et al., 2010). Shipping is most intensive in human-modified river ecosystems (e.g. large rivers and canals) with artificial or semi-natural embankments. Such habitats are often very attractive for alien species to establish in and become dominant. Boets et al. (2010) found that the alien invasive species D. villosus thrives in canals with artificial concrete riverbanks. Many alien macroinvertebrates can easily colonise these hard substrates and establish stable populations on them (Van Riel et al., 2006), which makes such substrates preferred habitats.
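An elementary probability model can illustrate the argument that more ship passages raise the chance of establishment. For illustration only, it assumes that each of n ship arrivals independently leads to establishment with the same small probability p. Real introduction events are neither independent nor equally likely, and the value of p below is hypothetical.

```latex
P_{\mathrm{est}}(n) = 1 - (1 - p)^{n}

% Hypothetical example with p = 0.01:
%   n = 10:   P_est = 1 - 0.99^{10}  \approx 0.10
%   n = 100:  P_est = 1 - 0.99^{100} \approx 0.63
```

Since (1 - p)^n shrinks as n grows, the probability of at least one successful establishment rises monotonically with the number of arrivals, in line with the qualitative argument above.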

Besides reaching a new habitat and becoming established, alien species need favourable environmental conditions to build up viable populations. The abundance of alien macrocrustaceans was mainly determined by conductivity. The average abundance calculated from the linear regression equations was lower in freshwater (conductivity < 1041 µS/cm) than in brackish water (conductivity > 1041 µS/cm). In urban and densely populated areas, where large amounts of nutrients end up in river systems, the abundance of native species can decrease while that of alien species increases (Vermonden et al., 2010). In the initial stage of introduction, alien species often have a competitive advantage over indigenous species at high nutrient concentrations, which contribute to a low water quality (Grabowski et al., 2007; Strayer, 2010). As chemical as well as biological water quality improves, indigenous species might again be able to compete with alien species. Leuven et al. (2009) found that in urban waters in the Netherlands, indigenous macroinvertebrates were able to coexist with and even dominate alien species in nutrient-poor, densely vegetated systems. However, our models indicate that alien species can also benefit from improving water quality, at least under freshwater conditions, and that high nutrient concentrations can negatively affect alien species richness. The high abundances detected at higher conductivities could be attributed to species such as Gammarus tigrinus, which occurs especially in brackish waters in Flanders, where it can reach high abundances (Boets et al., 2011b). This species tolerates a wide range of salinities but is generally found in waters with a conductivity between 1200 and 3200 µS/cm.

Fig. 6. Modelled prevalence of alien macrocrustaceans for the years 2006, 2015 and 2027, based on a water quality model for the canal Ghent-Terneuzen and its major tributaries (white = absent, black = present).
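The abundance model is a regression tree whose leaves contain linear regression equations (Fig. 4). The Python sketch below is reduced to the single conductivity split at 1041 µS/cm. It shows how such a tree produces a prediction and how a one-variable sensitivity analysis, like the one in Fig. 5, can be run over it. The leaf coefficients and the clipping at zero are hypothetical choices, not the fitted equations. Their signs only follow the conclusion that abundance decreases with conductivity in freshwater and increases in brackish water.

```python
# Sketch of a model tree: a regression tree whose leaves hold linear equations.
# Only the split at 1041 uS/cm comes from the text; the leaf coefficients are
# HYPOTHETICAL placeholders, not the fitted equations of Fig. 4.
from dataclasses import dataclass

SPLIT_CONDUCTIVITY = 1041.0  # uS/cm, freshwater/brackish split reported in the text


@dataclass(frozen=True)
class LinearLeaf:
    intercept: float
    slope: float  # change in predicted abundance per uS/cm

    def predict(self, conductivity: float) -> float:
        # A linear equation can go negative; abundance cannot, so clip at zero.
        return max(0.0, self.intercept + self.slope * conductivity)


# Hypothetical leaves: negative slope in freshwater, positive slope in brackish water.
FRESH_LEAF = LinearLeaf(intercept=60.0, slope=-0.03)
BRACKISH_LEAF = LinearLeaf(intercept=20.0, slope=0.05)


def predict_abundance(conductivity: float) -> float:
    """Route a site to a leaf by its conductivity, then apply that leaf's equation."""
    leaf = FRESH_LEAF if conductivity <= SPLIT_CONDUCTIVITY else BRACKISH_LEAF
    return leaf.predict(conductivity)


def sensitivity_curve(start: float, stop: float, step: float) -> list[tuple[float, float]]:
    """Sweep conductivity over a regular grid and record the predicted abundance."""
    points = []
    value = start
    while value <= stop:
        points.append((value, predict_abundance(value)))
        value += step
    return points


if __name__ == "__main__":
    for cond, abundance in sensitivity_curve(200.0, 3200.0, 500.0):
        print(f"{cond:6.0f} uS/cm -> {abundance:6.1f} individuals")
```

Sweeping one input over a grid, as `sensitivity_curve` does, makes two features directly visible: the jump at the split and the opposite slopes in the two leaves.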
Fig. 7. Modelled species richness of alien macrocrustaceans for the years 2006, 2015 and 2027, based on a water quality model for the canal Ghent-Terneuzen and its major tributaries (white = no species, red = 1 species, black = 2 species).


4.2. Integrated modelling 

Integrated models are often based on a combination of environmental and climatic conditions (e.g. Ficetola et al., 2007; Gallien et al., 2010). Habitats that meet these environmental and climatic constraints are identified as vulnerable to invasions. In this paper, a somewhat different approach was used: a habitat suitability model was integrated with a predictive water quality model. Based on this integrated model, the prevalence and species richness of alien macrocrustaceans are expected to increase by 2027. Abundance, by contrast, is predicted to decrease at some locations and otherwise to remain stable at between 10 and 100 individuals per sample. In an initial stage, alien macrocrustaceans can have a competitive advantage in habitats with low water quality and high nutrient levels. However, they can also benefit from an improvement in chemical water quality, especially in watercourses that evolve from bad to moderate water quality. The chemical water quality in Flanders is predicted to improve during the coming decades owing to the installation of wastewater treatment plants. Our integrated model can therefore give more accurate predictions of the future distribution of alien macrocrustaceans than simple habitat suitability models based on present-day conditions alone. This integrated model thus provides valuable insight into the potential future invasive range of alien macrocrustaceans.
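The integration can be pictured as a two-stage pipeline. The water quality model supplies projected conditions for each site and scenario year. The habitat suitability model then turns those conditions into a presence/absence prediction, from which prevalence follows. The Python sketch below replaces the classification tree with a hand-written rule. The rule is modelled on the summary that alien macrocrustaceans are predicted in brackish water and in freshwater with intensive ship traffic and low organic pollution. The thresholds, the Kjeldahl nitrogen cut-off, the site names and the projected values are all hypothetical.

```python
# Sketch of chaining a water quality model with a habitat suitability model.
# All thresholds, site names and projected values are HYPOTHETICAL; they only
# illustrate the data flow, not the fitted trees or the real projections.
from dataclasses import dataclass

BRACKISH_THRESHOLD = 1041.0  # uS/cm, assumed brackish cut-off (reused split value)
MAX_KJELDAHL_N = 3.0         # mg N/L, hypothetical "low organic pollution" cut-off


@dataclass(frozen=True)
class SiteConditions:
    conductivity: float       # uS/cm, output of the water quality model
    kjeldahl_n: float         # mg N/L, output of the water quality model
    intensive_shipping: bool  # assumed unchanged between scenario years


def predict_presence(site: SiteConditions) -> bool:
    """Rule-based stand-in for the habitat suitability classification tree."""
    if site.conductivity > BRACKISH_THRESHOLD:
        return True
    return site.intensive_shipping and site.kjeldahl_n <= MAX_KJELDAHL_N


# Hypothetical water quality projections for two sites in two scenario years.
projections = {
    2006: {"site_A": SiteConditions(900.0, 5.0, True),
           "site_B": SiteConditions(1500.0, 4.0, False)},
    2027: {"site_A": SiteConditions(880.0, 2.0, True),
           "site_B": SiteConditions(1450.0, 2.5, False)},
}


def prevalence(year: int) -> float:
    """Fraction of sites predicted as occupied in a given scenario year."""
    sites = projections[year]
    occupied = sum(predict_presence(conditions) for conditions in sites.values())
    return occupied / len(sites)


if __name__ == "__main__":
    for year in sorted(projections):
        print(year, prevalence(year))
```

The habitat model is applied unchanged to each year's projected conditions, so any change in predicted prevalence comes entirely from the water quality scenarios. In this toy example, site_A becomes suitable once its Kjeldahl nitrogen falls below the hypothetical cut-off.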

It has been suggested to incorporate species migration, population dynamics, biotic interactions and community ecology into species distribution models at multiple spatial scales (Guisan and Thuiller, 2005). As a next step, our integrated model could be improved by also including the dispersal of alien macrocrustaceans via a migration model. In our current model, species can only be present if an appropriate vector (shipping) is present, which is a limitation. Future models could account for the active colonisation of new areas by the species, and for the time this takes, and thereby give more accurate predictions. Recently, Gallardo et al. (2012) combined a large-scale bioclimatic model with a local-scale migration model to predict the future distribution of D. villosus in Great Britain. Based on different scenarios of annual migration speed, they were able to predict the dispersal of this alien macrocrustacean within the Great Ouse River catchment. They concluded that this approach helps to prevent and control the spread of alien invasive species and can consequently provide managers with a powerful spatial and temporal basis for informed decision-making.
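One way to relax the vector-only assumption is to require that a suitable site has also been reached by the spreading population. The Python sketch below uses the simplest possible migration model: a front advancing at a constant annual speed along the river from a single introduction point. It evaluates a slow and a fast speed scenario, in the spirit of the scenario approach described above. It is not the model of Gallardo et al. (2012). The distances, speeds, suitability verdicts and introduction year are all hypothetical.

```python
# Sketch of adding a dispersal constraint to a habitat suitability prediction.
# Distances, migration speeds and the introduction year are HYPOTHETICAL values
# used only to show how annual-migration-speed scenarios can delay colonisation.
import math

INTRODUCTION_YEAR = 2006

# Hypothetical river distance (km) from the introduction point and a
# hypothetical habitat suitability verdict for each site.
sites = {
    "harbour": (0.0, True),
    "canal_km_25": (25.0, True),
    "tributary_km_60": (60.0, True),
    "upstream_km_90": (90.0, False),
}


def arrival_year(distance_km: float, speed_km_per_year: float) -> int:
    """First year in which a front moving at constant speed reaches the site."""
    return INTRODUCTION_YEAR + math.ceil(distance_km / speed_km_per_year)


def occupied_sites(year: int, speed_km_per_year: float) -> list[str]:
    """Sites that are both suitable and already reached by the given year."""
    return [
        name
        for name, (distance, suitable) in sites.items()
        if suitable and arrival_year(distance, speed_km_per_year) <= year
    ]


if __name__ == "__main__":
    for speed in (5.0, 20.0):  # slow and fast migration scenarios (km/year)
        print(speed, occupied_sites(2015, speed))
```

Under the slow scenario, the suitable tributary site is still unoccupied in 2015, whereas the fast scenario has already reached it. The unsuitable upstream site stays empty in both cases, because dispersal only filters the habitat suitability prediction and never overrides it.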

4.3. Model performance 

The performance of the habitat preference model was fair to moderate according to Gabriels et al. (2007). The variation between the cross-validation folds was limited, which indicates that the constructed models are robust. The moderate performance could be due to several factors inherent to alien species. First, alien species may not yet have spread to all suitable habitats, which makes it difficult to determine species-environment relationships (Stohlgren et al., 2010). Second, alien species are often highly opportunistic and can easily cope with changes in environmental conditions (Nehring, 2006; Williamson and Fitter, 1996). They can be seen as generalists that invade whichever niches are available. Moreover, most alien species are omnivores and consequently do not have specific requirements regarding food availability (Strayer, 2010). All these elements make it difficult to accurately predict the habitat suitability and the distribution range of alien macrocrustaceans. Araújo and Guisan (2006) suggest that evaluation strategies should be discussed in the context of three possible uses of models: description, understanding and prediction. The complexity of model evaluation increases from description to prediction. Models that simply seek to describe a given pattern may not need to be evaluated at all, whereas the evaluation of models aiming at prediction is desirable but not always conceptually possible. Although the accuracy of our models was not very high, Džeroski and Drumm (2003) point out that the primary goal of such an analysis is to pinpoint the essential site characteristics rather than to predict the exact number of species.

Fig. 8. Modelled abundance of alien macrocrustaceans for the years 2006, 2015 and 2027, based on a water quality model for the canal Ghent-Terneuzen and its major tributaries. Abundances were divided into several classes: white = absent, red = 1 individual, blue = 2-10 individuals, green = 11-100 individuals and black = 101-1000 individuals.

A major difficulty in applying data mining techniques to model alien invasive species is selecting the settings that yield the optimal model. The performance of models can be assessed from different perspectives, usually accounting for technical reliability, ecological relevance and user convenience. However, it is difficult to find a balance between these criteria, which are, moreover, to some extent both synergistic and antagonistic. Consequently, frameworks are needed to guide model developers in comparing and selecting optimal settings. Such frameworks should be based on the specific characteristics of the data as well as the needs and interests of the model users (Willems, 2010). Several approaches have been suggested to overcome the shortcomings of traditional modelling techniques:

- ensemble forecasting models (Araújo and New, 2007; Stohlgren et al., 2010);
- the use of presence as well as absence data (Phillips et al., 2009);
- the incorporation of dispersal based on estimates of dispersal rates (Midgley et al., 2006);
- the development of spatially explicit species distribution models (Harris et al., 2009; Iverson et al., 2009; Smollik et al., 2010).

Nevertheless, we conclude that our models are useful and understandable tools for determining the environmental parameters and conditions that are important for alien macrocrustaceans to establish and become dominant. Decision makers could use these models to pinpoint regions within the aquatic environment that are under severe threat of invasion by alien species.

4.4. Conclusion 

Alien macrocrustaceans prefer waters with a high conductivity. Shipping in combination with a good chemical water quality promotes the occurrence of alien macrocrustaceans. The highest species richness of alien macrocrustaceans was found in freshwater with a good chemical water quality and intensive ship traffic. In freshwater, increasing conductivity led to a decrease in the abundance of alien macrocrustaceans, whereas in brackish water it had a positive effect on abundance. The predictions based on our integrated modelling approach indicated that the prevalence and the species richness of alien macrocrustaceans are likely to increase with improving chemical water quality, whereas their abundance will probably decrease slightly. This study shows that models are a useful tool for decision-making. Policy makers should focus on vulnerable areas, such as brackish waters and waters with intensive ship traffic, in order to prevent the further introduction and spread of alien species.